Cory Whitney
"2019-03-20"
Open RStudio
Help > Cheatsheets > Data Visualization with ggplot2
Type ‘?’ followed by a function, package or data name in the R console
Add “R” to a web search together with a copy of the error message
Many talented programmers scan the web and answer questions
R has several systems for making graphs
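# stringsAsFactors = TRUE reads text columns as factors (the default before R 4.0)
participants_data <- read.csv("participants_data.csv", stringsAsFactors = TRUE)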
plot(participants_data$academic_parents)
Bar plot of number of observations of binary data related to academic parents
plot(participants_data$academic_parents, participants_data$days_to_email_response)
Boxplot of days to email response grouped by binary data related to academic parents
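The two plots above differ because plot() chooses the plot type from the class of its arguments. A minimal sketch, assuming a small made-up data frame (toy_data and its values are hypothetical, not the course data):

```r
# Hypothetical stand-in for participants_data (values are made up)
toy_data <- data.frame(
  academic_parents = factor(c("Y", "N", "Y", "Y", "N", "N")),
  days_to_email_response = c(1, 3, 2, 5, 1, 4)
)

# A single factor: plot() draws a bar plot of counts per level
plot(toy_data$academic_parents)

# A factor and a numeric vector: plot() draws one boxplot per level
plot(toy_data$academic_parents, toy_data$days_to_email_response)

# Check the class; a character column would need factor() first
class(toy_data$academic_parents)
```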
Use help '?' for function
?plot
Many libraries and functions for graphs in R…
ggplot2 is one of the most elegant and most versatile.
ggplot2 implements the grammar of graphics to describe and build graphs.
Do more and do it faster by learning one system and applying it in many places.
Learn more about ggplot2 in “The Layered Grammar of Graphics”
http://vita.had.co.nz/papers/layered-grammar.pdf
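As a rough illustration of the layered idea, a plot can be assembled from data, an aesthetic mapping and a geometric layer. This sketch uses the built-in iris data; the object name p and the axis labels are arbitrary choices:

```r
library(ggplot2)

# Data and aesthetic mapping: which variables go to x, y and colour
p <- ggplot(data = iris,
            mapping = aes(x = Sepal.Length, y = Petal.Length, colour = Species))

# Add a geometric layer: draw each observation as a point
p <- p + geom_point()

# Add axis labels as a further component
p <- p + labs(x = "Sepal length (cm)", y = "Petal length (cm)")

# Printing the object draws the plot
p
```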
qplot: the 'poor man's ggplot'?
library(ggplot2)
qplot(days_to_email_response, letters_in_first_name, data = participants_data)
Scatterplot of the letters in your first name as a function of days to email response
Use help '?' for function
?qplot
Want to understand how all the pieces fit together? See the R for Data Science book: http://r4ds.had.co.nz/
Example from your data
qplot(days_to_email_response, letters_in_first_name, color=academic_parents, size=working_hours_per_day, data=participants_data)
Scatterplot of letters in your first name as a function of days to email response with colors representing binary data related to academic parents and working hours per day as bubble sizes.
Make more graphs
Example from Anderson's iris data set
qplot(Sepal.Length, Petal.Length, data=iris, color=Species, size=Petal.Width)
Scatterplot of iris petal length as a function of sepal length with colors representing iris species and petal width as bubble sizes.
Use help '?' for data
?iris
qplot accepts transformed variables as arguments, such as log(carat)
plot1<-qplot(carat, price, data = diamonds)
plot2<-qplot(log(carat), log(price), data = diamonds)
Use help '?' for data
?diamonds
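An alternative to wrapping the variables in log() is to keep the raw values and put the axes on a log scale. A sketch using ggplot() with scale_x_log10() and scale_y_log10(); the name plot3 is arbitrary, and note that log() is the natural log while these scales use base 10, so the point pattern matches but the axis labels differ:

```r
library(ggplot2)

# Same data as plot2, but the axes are log10-scaled instead of the variables,
# so tick labels stay in the original units
plot3 <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()

plot3
```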
#Create a sample
dsmall <- diamonds[sample(nrow(diamonds), 100), ]
#Plot with different colours for color
qplot(carat, price, data = dsmall, colour = color)
#Plot with different shapes for cut
qplot(carat, price, data = dsmall, shape = cut)
Different colors and shapes
Use help '?' for function
?sample
Help on topic 'sample' was found in the following packages:
Package Library
dplyr /Users/macbook/Library/R/3.5/library
base /Library/Frameworks/R.framework/Resources/library
Using the first match ...
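sample() draws different rows on every run, so the plots of dsmall will differ between sessions. A sketch of making the draw repeatable with set.seed(); the seed value 42 and the names dsmall_a and dsmall_b are arbitrary:

```r
library(ggplot2)  # provides the diamonds data

# Fix the random number generator state so the same rows are drawn each time
set.seed(42)
dsmall_a <- diamonds[sample(nrow(diamonds), 100), ]

set.seed(42)
dsmall_b <- diamonds[sample(nrow(diamonds), 100), ]

# Compare the two samples row by row
identical(dsmall_a, dsmall_b)
```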
Set parameters manually with I()
qplot(carat, price, data = diamonds, alpha=I(0.1), colour=I("blue"))
qplot(carat, price, data = diamonds, alpha=I(0.4), colour=I("green"))
Inhibit Interpretation / Conversion of Objects
Use help '?' for function
?I
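In ggplot() the same distinction is made by where the value is written: inside aes() it is mapped to a scale, outside aes() it is set directly. A small sketch (object names are arbitrary):

```r
library(ggplot2)

# I() marks a value "as is", so qplot uses it literally instead of mapping it
class(I("blue"))

# Mapped: "blue" is treated as a one-level variable and gets a default colour
p_mapped <- ggplot(diamonds, aes(carat, price)) +
  geom_point(aes(colour = "blue"))

# Set: every point is drawn in blue with 10% opacity
p_set <- ggplot(diamonds, aes(carat, price)) +
  geom_point(colour = "blue", alpha = 0.1)
```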
With “geom” different types of plots can be defined, e.g. "point", "line", "boxplot", "path", "smooth". Several geoms can also be combined in a vector.
qplot(carat,price,data=dsmall, geom="line")
qplot(carat,price,data=dsmall, geom="smooth")
qplot(carat,price,data=dsmall, geom=c("point","smooth"))
ggplot2 geom options
Use help '?' for function
?qplot
read 'Arguments' section of help file
The smooth geom chooses its method from the size of the dataset: by default it uses loess for fewer than 1,000 observations and a generalized additive model (gam) otherwise.
qplot(carat,price,data=dsmall,geom=c("point","smooth"))
qplot(carat,price,data=diamonds,geom=c("point","smooth"))
With span the wiggliness of the loess line is controlled: smaller values give a wigglier line, larger values a smoother one.
qplot(carat,price,data=dsmall, geom=c("point","smooth"), span=0.2)
Use method to specify your smoothing method
qplot(carat,price,data=dsmall,geom=c("point","smooth"),method="lm")
ggplot2 lines and smoothing options
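The same smoothing options can be written with ggplot() and geom_smooth(), sketched here. dsmall is re-created so the block runs on its own; the seed and the object names are arbitrary:

```r
library(ggplot2)

set.seed(1)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

base_plot <- ggplot(dsmall, aes(carat, price)) + geom_point()

# Loess smoother; a small span follows the data closely (wigglier line)
p_wiggly <- base_plot + geom_smooth(method = "loess", span = 0.2)

# A larger span gives a smoother line
p_smooth <- base_plot + geom_smooth(method = "loess", span = 1)

# Straight-line fit from a linear model
p_lm <- base_plot + geom_smooth(method = "lm")
```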
qplot(color,price/carat,data=diamonds, geom="boxplot")
qplot(color,price/carat,data=diamonds, geom="jitter")
qplot(color,price/carat,data=diamonds, geom="jitter", alpha=I(0.1))
ggplot2 boxplots and jitter
Histograms can be displayed through geom = "histogram", density plots through geom = "density".
qplot(carat, data = diamonds, geom = "density")
qplot(carat, data = diamonds, geom = "density", colour = color)
qplot(carat, data = diamonds, geom = "density", fill = color, alpha=I(0.3))
ggplot2 density plots
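For comparison, a histogram of the same variable could be drawn as follows. The binwidth of 0.1 carat is an arbitrary choice for illustration, as are the object names:

```r
library(ggplot2)

# Histogram: counts of diamonds in 0.1-carat bins
p_hist <- ggplot(diamonds, aes(carat)) +
  geom_histogram(binwidth = 0.1)

# Stacked histogram with one fill colour per diamond colour grade
p_hist_fill <- ggplot(diamonds, aes(carat, fill = color)) +
  geom_histogram(binwidth = 0.1)
```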
Use factor() to split your data into groups.
qplot(displ, hwy, data = mpg, colour = cyl, geom=c("point","smooth"),method="lm")
qplot(displ, hwy, data = mpg, colour = factor(cyl), geom=c("point","smooth"),method="lm")
ggplot2 subset with smooth line
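Why the two plots differ: cyl is stored as a number, so ggplot2 treats it as continuous and fits a single line, while factor(cyl) turns it into discrete groups, each with its own line. A quick check (cyl_groups is an arbitrary name):

```r
library(ggplot2)  # provides the mpg data

# cyl is an integer column, i.e. a continuous variable for ggplot2
is.numeric(mpg$cyl)

# factor() converts it to a categorical variable with one level per value
cyl_groups <- factor(mpg$cyl)
levels(cyl_groups)

# Number of observations in each group
table(cyl_groups)
```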
https://evamaerey.github.io/ggplot_flipbook/ggplot_flipbook_xaringan.html#1
Usual ggplot code
ggplot(mtcars, aes(x = mpg, y = hp, col = gear)) +
  geom_point() +
  ggtitle("My Title") +
  labs(x = "the x label", y = "the y label", col = "legend title")
'Slow ggplotting' version for same plot
ggplot(data = mtcars) +
  aes(x = mpg) +
  labs(x = "the x label") +
  aes(y = hp) +
  labs(y = "the y label") +
  geom_point() +
  aes(col = gear) +
  labs(col = "legend title") +
  labs(title = "My Title")
cor.test(participants_data$days_to_email_response, participants_data$letters_in_first_name)
Pearson's product-moment correlation
data: participants_data$days_to_email_response and participants_data$letters_in_first_name
t = -1.647, df = 13, p-value = 0.1235
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.7649469 0.1229287
sample estimates:
cor
-0.4154989
Use help '?' for function
?cor.test
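The printed values can also be pulled out of the result object for further use. A sketch using the built-in mtcars data instead of the course data; res, r and dof are arbitrary names:

```r
# cor.test() returns a list of class "htest"
res <- cor.test(mtcars$mpg, mtcars$wt)

res$estimate   # Pearson correlation coefficient (cor)
res$statistic  # t statistic
res$parameter  # degrees of freedom (n - 2)
res$p.value    # p-value of the two-sided test
res$conf.int   # 95 percent confidence interval

# The t statistic follows from the correlation: t = r * sqrt(df / (1 - r^2))
r <- unname(res$estimate)
dof <- unname(res$parameter)
r * sqrt(dof / (1 - r^2))
```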
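library(ggplot2)
library(gganimate)
library(datasauRus) # provides the datasaurus_dozen data
# Animate between the datasets of the Datasaurus Dozen
ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point() +
  theme_minimal() +
  transition_states(dataset, transition_length = 3, state_length = 1) +
  ease_aes('cubic-in-out')
# Animate boxplots of mpg by number of cylinders between transmission types (am)
ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot() +
  geom_point() +
  transition_states(am, transition_length = 4, state_length = 1) +
  view_follow()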
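library(dplyr)    # select()
library(reshape2) # melt()
# Keep the numeric columns of interest (each column listed once)
part_data <- select(participants_data, days_to_email_response, number_of_siblings, years_of_study, number_of_publications, letters_in_first_name, km_home_to_zef, working_hours_per_day)
# Correlation matrix, rounded to one decimal
cormat <- round(cor(part_data), 1)
# Long format: one row per pair of variables (Var1, Var2, value)
melted_cormat <- melt(cormat)
ggplot(data = melted_cormat, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile()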
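The same steps can be tried without the course data. This sketch uses a few mtcars columns and base R's as.table() in place of melt() to get the long format, so the value column is called Freq rather than value; the choice of columns and the colour scale limits are assumptions for illustration:

```r
library(ggplot2)

# Correlation matrix of a few numeric columns, rounded to one decimal
cormat <- round(cor(mtcars[, c("mpg", "hp", "wt", "qsec")]), 1)

# Long format: one row per pair of variables (columns Var1, Var2, Freq)
long_cormat <- as.data.frame(as.table(cormat))

# Tile plot with a diverging colour scale centred on zero
ggplot(long_cormat, aes(x = Var1, y = Var2, fill = Freq)) +
  geom_tile() +
  scale_fill_gradient2(limits = c(-1, 1))
```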
?pdf
?png
png(file = "cortile.png", width = 7, height = 6, units = "in", res = 300)
ggplot(data = melted_cormat, aes(x = Var1, y = Var2, fill = value)) + geom_tile() + theme(axis.text.x = element_text(angle = 45, hjust = 1))
dev.off()
list.files()
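ggplot2 also has its own export helper, ggsave(), which picks the graphics device from the file extension. A sketch that writes to a temporary folder; the plot and the file name example_plot.png are placeholders:

```r
library(ggplot2)

# Any ggplot object can be saved; this one is just a placeholder
p <- ggplot(mtcars, aes(mpg, hp)) + geom_point()

# Write a 7 x 6 inch PNG at 300 dpi to a temporary location
out_file <- file.path(tempdir(), "example_plot.png")
ggsave(out_file, plot = p, width = 7, height = 6, units = "in", dpi = 300)

# Confirm that the file was written
file.exists(out_file)
```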
If time allows, create and export more figures
Install Git and set up GitHub (if you do not already have them).
Git https://git-scm.com/downloads
GitHub http://r-pkgs.had.co.nz/git.html
Join GitHub https://github.com/